SYSTEM AND METHOD FOR EFFICIENTLY FINDING NEAR-SIMILAR
IMAGES IN MASSIVE DATABASES



BACKGROUND OF THE INVENTION 

Multimedia database image retrieval techniques are known which attempt to find
and retrieve a matching image or matching images from a database of stored multimedia
images. Such retrieval techniques are becoming commonplace with the proliferation of
information disseminated and available over computer networks such as the Internet.
Massive amounts of multimedia data are stored in databases supporting web pages and
servers, including text, graphics, video and audio. Searching and finding matching
multimedia images can be time and computationally intensive.

Queries employed to find matching images typically compute statistics for the
image and compare the statistics to a database of statistics from potential matches.
Alternatively, the image is subdivided into regions and statistics are computed for each
region. The statistics are combined into a vector quantity in a high-dimensional space,
and comparison between two images involves computing, for example, the Euclidean
distance between the vectors to determine similarity. Vectors which are "near" to each
other correspond to images which are similar.

In the case of images, however, typical prior art techniques tend to check every
image in the database for similarity, a process which is very slow for large data sets.
Indexing techniques such as a K-d tree may be used to augment the search, but
frequently fail to effectively restrict the search to a small portion of the database,
resulting in an exhaustive "brute force" search methodology, particularly with
multidimensional spaces greater than 20 dimensions.

The dimensionality performance issue has been addressed by Locality-Sensitive
Hashing ("Approximate Nearest Neighbors: Towards Removing the Curse of
Dimensionality," in Proc. 30th Symposium on Theory of Computing (1998)). Standard
similarity metrics, however, such as Euclidean and Manhattan distance-based
algorithms, cannot take full advantage of the multidimensional near-neighbor
searching provided by Locality-Sensitive Hashing (LSH) because they do not satisfy
certain properties exploited by LSH.

Further, comparison techniques used for images tend to be sensitive to common
transformations. Such comparison techniques may not be robust enough to detect a
match between two images that differ by a subtle geometric transformation, such as
rotation, translation, or scaling.

Accordingly, it would be beneficial to develop an efficient method for finding
near-similar images which avoids an exhaustive search of all candidate data and which
is resilient to minor geometric transformations of similar images.



SUMMARY OF THE INVENTION 

A method for storing and retrieving image data from a database having a
plurality of potential match images includes (i) computing a match descriptor
corresponding to a multidimensional space indicative of each of the stored images, and
(ii) organizing each of the match descriptors according to a similarity metric. The
similarity metric is employed to order match descriptors near to other match descriptors
in the multidimensional space. A target image for which a match is sought is then
received, and a target descriptor indicative of the target image is computed. The
database is referenced, or mapped, to determine a close match to the target descriptor
among the match descriptors in the database, a close match being determined by a
distance to a near match descriptor being within a predetermined threshold. The
database mapping includes selecting a nearest-neighbor candidate match descriptor from
among the match descriptors in the database and employing a distance metric derived
from the similarity metric to determine if the candidate match descriptor is a match to
the target descriptor.

The match descriptors are invariant descriptors derived either from a
Fourier-Mellin Transform (FMT) or color histogram, for example, which are generally
insensitive to geometric transformations such as translation, scaling, and rotation, or
to common image processing such as compression and filtering. Such descriptors capture
information about the images in the form of a set such that a set similarity metric,
described further below, may be applied to determine similarity or dissimilarity between
images.

The match descriptors denote a vector quantity in a multidimensional space and
are stored in the database accordingly. Locality-Sensitive Hashing (LSH) is employed
to organize, or order, the descriptors in the database in a nearest-neighbor manner such
that the descriptors corresponding to similar images are stored near each other in the
database. By storing the descriptors near descriptors corresponding to similar images,
existing matches are selected by the hashing, rather than requiring a brute-force search
of all descriptors in the database. Therefore, LSH ordering allows a measure of
similarity for matching to be applied by examining only a fraction of the database. A
distance metric derived from the similarity metric indicates descriptors which are near
matches to other descriptors in the multidimensional space. Candidate match
descriptors which are near the match descriptor are selected as the matching image or
set of images responsive to a query. In alternate embodiments, additional searching
from among near descriptors could also occur, followed by attempts to match against
more distant descriptors.

In this manner, target images which have been subjected to geometric
transformations may still be found because the invariant descriptors are insensitive to
such transformations, and the search required need traverse only a subset of the database
because the descriptors are stored near other, similar descriptors via LSH, which
provides that candidate match descriptors more likely to produce a valid match are tried
before other less likely match candidates.

BRIEF DESCRIPTION OF THE DRAWINGS 

The foregoing and other objects, features and advantages of the invention will be
apparent from the following more particular description of preferred embodiments of
the invention, as illustrated in the accompanying drawings in which like reference
characters refer to the same parts throughout the different views. The drawings are not
necessarily to scale, emphasis instead being placed upon illustrating the principles of the
invention.

Fig. 1 is a block diagram of the invention system as defined herein;
Figs. 2a and 2b are data flow diagrams of the population and retrieval phases,
respectively, of the present invention as defined herein;
Fig. 3 illustrates Euclidean distance in a multidimensional space;
Fig. 4 shows a flowchart of the invention system as defined herein;
Fig. 5 is a graphical illustration of vector quantization as defined herein; and
Fig. 6 shows image partitioning for a set similarity metric of the preferred
embodiment.



DETAILED DESCRIPTION OF THE INVENTION

A description of preferred embodiments of the invention follows. 

The image storage and retrieval system of the present invention employs
two phases. A population phase stores image data in a database according to a
similarity metric. The similarity metric ensures that similar images are stored near other
similar images in the multidimensional space defined by the database. Storing images
near other images limits search traversals to a fraction of the database because the image
sought is organized near similar images. A retrieval phase traverses the database and
compares a target image to the stored images. The images are compared according to a
similarity metric, which defines distance in the multidimensional space. Images which
are sufficiently near the target images are deemed to be a match. By organizing the
database according to the similarity metric, the search commences on images near the
matching image, and successive match attempts will occur on near images such that
only a fraction of the database need be mapped to find a matching image.

Fig. 1 shows a block diagram of the system as defined herein. Referring to Fig.
1, the image storage and retrieval system 10 includes a descriptor constructor 12, a
similarity processor 14, a database 16, and a database mapper 18. The descriptor
constructor 12 is employed to compute an invariant descriptor corresponding to a raw
target image 20 for which a match is sought.

The database mapper 18 finds a candidate match descriptor in the database 16
from among vectors which are near the descriptor of the raw target image 20 for which a
match is sought, described further below. The similarity processor 14 compares the two
descriptors (the raw target image 20 descriptor and the candidate match descriptor from
database 16) according to a similarity metric to determine if there is a match. If the
similarity processor 14 determines that the two vectors are sufficiently similar, the
matching image or images 22 corresponding to the candidate match descriptor are
returned. The matching image or images may have been transformed, such as scaled or
rotated, as shown by the target image 20 and matching image 22 in Fig. 1. In a
particular embodiment, the locality-sensitive hashing employed in organizing the
database 16 results in a nearest-first mapping that returns the pertinent image or images. In
the event that the mapping does not yield a workable set of images, additional candidate
match descriptors may be selected by the database mapper 18 from among descriptors
near the previous candidate match descriptors in the database 16, and the foregoing
similarity measure by similarity processor 14 may be repeated.

Fig. 2a shows a data flow of the population phase. Referring to Fig. 2a, raw
image data is gathered for inclusion in the database 16, as shown by arrow 24. The raw
image data may be gathered from a variety of sources, such as the Internet 26, magnetic
media 28, digital camera 30 via PC 32, or other sources. The raw image data 24 is used
to derive an invariant descriptor 34 corresponding to the raw image 24. The invariant
descriptor 34 is a vector form of the data, such as statistics, in a multidimensional space.

The invariant descriptor 34 may be computed by a Fourier-Mellin Transform
(FMT) 36, color histogram, or other method, and defines the image data 24 in terms of
attributes. The attributes may be expressed as inclusion or exclusion from a set, and
therefore may be expressed in a boolean form which may be compared according to a
set similarity metric, described further below. The use of an FMT has been employed
with text, as described in Bracewell, "The Fourier Transform and its Applications,"
McGraw-Hill, New York (1978). The result of the FMT can be further processed using
vector quantization, described further below, which produces output that is symbolic
and is amenable to a set similarity metric. The invariant descriptor 34 defines the image
data 24 in terms which are resistant to typical geometric transformations such as
rotation, translation, scaling, cropping or image processing operations such as
compression and filtering. In this manner, the invariant descriptor form may be used to
compare images and detect matches of images which differ merely in size, orientation,
scaling or omission.

The invariant descriptor 34 is then organized in the database 16 according to
LSH 38 using a similarity metric 40. The similarity metric 40 is preferably a set
similarity metric which orders the invariant descriptor 34 near other invariant
descriptors already in the database 16. The LSH 38 determined order is used to provide
an ordered descriptor 42 to the database 16. Image data 24 is stored and organized in
the database 16, thereby producing a database 16 of image data descriptors organized in
a multidimensional space in which descriptors corresponding to similar images are
organized near each other in the multidimensional space.
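
The population step described above can be sketched in Python. This is an illustration under stated assumptions rather than the method of the preferred embodiment: the text does not name a particular LSH family, so the sketch uses MinHash with banding, a family commonly paired with set-overlap similarity. The names `LshIndex`, `minhash_signature` and `bucket_keys`, the signature length, the band count, and the use of BLAKE2 as the hash are all hypothetical choices, and descriptors are assumed to be non-empty sets.

```python
import hashlib
from collections import defaultdict

NUM_HASHES = 12  # assumed signature length (hypothetical)
BANDS = 4        # assumed number of bands, 3 signature rows per band


def _h(seed, element):
    """Deterministic 64-bit hash of one set element under a given seed."""
    data = f"{seed}:{element}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(descriptor):
    """MinHash signature: the minimum element hash under each seed."""
    return tuple(min(_h(seed, e) for e in descriptor)
                 for seed in range(NUM_HASHES))


def bucket_keys(signature):
    """Split the signature into bands; each (band, rows) pair is a bucket key."""
    rows = NUM_HASHES // BANDS
    return [(b, signature[b * rows:(b + 1) * rows]) for b in range(BANDS)]


class LshIndex:
    """Toy stand-in for database 16: buckets of image ids keyed by band."""

    def __init__(self):
        self.buckets = defaultdict(set)
        self.descriptors = {}

    def add(self, image_id, descriptor):
        """Store a match descriptor in every bucket its signature maps to."""
        self.descriptors[image_id] = frozenset(descriptor)
        for key in bucket_keys(minhash_signature(descriptor)):
            self.buckets[key].add(image_id)

    def candidates(self, descriptor):
        """Ids of stored descriptors sharing at least one bucket with the query."""
        found = set()
        for key in bucket_keys(minhash_signature(descriptor)):
            found |= self.buckets.get(key, set())
        return found


index = LshIndex()
index.add("img1", {"h", "e", "l", "o", "t", "r"})
index.add("img2", {"x", "y", "z"})
near = index.candidates({"h", "e", "l", "o", "t", "r"})  # contains "img1"
```

With this scheme, two descriptors with a larger set overlap are more likely to share a bucket, so a query inspects only the buckets it hashes to instead of every stored descriptor.
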

Fig. 2b shows a data flow of the retrieval phase. Referring to Fig. 2b, target
image data 44 for which a match is sought is provided. The target image data 44
undergoes an FMT 36 to compute a target invariant descriptor 46 to compare against
database entries. The target invariant descriptor 46 is preferably a vector quantity of the
same dimensionality as the descriptors already stored in the database 16. A candidate
match descriptor 48 is provided from the database 16 from descriptors that are near to
the target descriptor 46. A similarity metric employed in ordering the descriptors in the
database 16 is employed to derive a distance metric 52, which is used to determine the
distance, or similarity 50, between the target descriptor 46 and the candidate match
descriptor 48. Similarity is determined by computing a distance, based on the distance
metric 52, between the target and candidate match descriptors 46, 48 in the
multidimensional space defined by the database 16. If the vectors of descriptors 46, 48
are similar, as indicated by a small distance, then the respective candidate match
descriptor 48 is returned as the match result 54; a result of no match is returned if the
descriptors 46, 48 are not similar. Alternatively, if the distance metric 52 does not indicate a match
between the descriptors 46, 48, then another candidate match descriptor 48 may be selected
from the near match descriptors in the database 16.

Fig. 3 illustrates the notion of distance between vectors in a multidimensional
space according to a prior art Euclidean metric. The distance between vectors indicates
the degree to which one vector is similar to another. In the example shown, a two dimensional
vector space, often referred to as a Cartesian plane, is shown as illustrative; however,
the invariant descriptors as described herein employ many more dimensions depending
on the number of statistics employed by the similarity metric. Referring to Fig. 3, the
two dimensional space 58 has an x axis 60 and a y axis 62. A first vector is defined by
x1, y1, and is shown as a point 64. A second vector, defined by x2, y2, is shown as a
point 66. The distance d between the two points is shown by dotted line 68, and
indicates the degree of similarity between the two vectors. As the number of
dimensions included in a vector increases, graphical representation becomes infeasible;
however, the notion of distance employed herein as defined by the similarity metric
remains.
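
As a small illustration of the prior art Euclidean metric of Fig. 3, the following Python sketch computes the distance d between two vectors of equal dimensionality. The function name and the coordinate values are illustrative assumptions and are not taken from the figure.

```python
import math


def euclidean_distance(u, v):
    """Return the Euclidean distance between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimensionality")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Two points in a Cartesian plane (hypothetical coordinates).
point_64 = (1.0, 2.0)  # first vector (x1, y1)
point_66 = (4.0, 6.0)  # second vector (x2, y2)
d = euclidean_distance(point_64, point_66)
```

The same function works unchanged for vectors of any dimensionality, which is the sense in which the notion of distance carries over once a plot is no longer practical.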

Note that the set similarity metric, as employed herein, differs from the
trigonometric representation of vector distance as shown in Fig. 3 in that the vectors are
defined in terms of a set. A set defines elements in terms of a boolean relationship of
inclusion or exclusion from the set. The LSH method employed to organize the
invention database 16 uses a set similarity metric, described further below. Further, the
similarity metric defines the distance metric used to determine descriptors, or vectors,
which are organized near other vectors in the database 16.

Fig. 4 shows a flowchart of descriptor ordering and database mapping in the
preferred embodiment. Referring to Figs. 4, 2a, and 2b, raw data images 24 are
gathered for population of the database 16, as shown at step 100. A
transformation-invariant descriptor 34 is computed for each image 24, as depicted at step 102. The
transformation-invariant descriptor 34 is organized according to the similarity metric
and stored in the database 16, as disclosed at step 104. A check is made to determine if
any more images 24 remain for organizing in the database 16, as shown at step 106. If
there are more images 24 for organizing in the database, processing control reverts to
step 102. Otherwise, the database 16 is populated with ordered invariant descriptors 42,
or match descriptors, as shown at step 107.

A target image 44 is received for matching against images (represented by
ordered descriptors 42) in the database 16, as depicted at step 108. An invariant
descriptor corresponding to the target image 44, or target descriptor 46, is computed, as
depicted at step 110. The target descriptor 46 is then employed to map into the database
16, as disclosed at step 112, and select a candidate match descriptor 48 from match
descriptors that are near the target descriptor 46, as shown at step 114. A check is
performed employing the distance metric 52 to determine if the selected candidate
match descriptor 48 is a match to the target descriptor 46, as depicted at step 116. A
match occurs if the distance metric 52 indicates that the two invariant descriptors 46, 48
are sufficiently near, or within a distance threshold, to be considered a match. The
match descriptor 48 is returned if a match was found, as shown at step 118.
Otherwise, a check is performed to determine if a search termination criteria,
indicative of a failure to find a match, is satisfied, as shown at step 120. The search
termination criteria may be a number of successive candidate match descriptors 48
having been compared, a candidate match descriptor beyond a certain distance, or a
combination of a maximum distance and number of iterations. If the search termination
criteria has been met, then no match exists in the database 16, and the search is
concluded at step 122. Otherwise, a new candidate match descriptor 48 is selected from
among the near match descriptors, as shown at step 124, and control reverts to step 114.
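
The retrieval loop of steps 114 through 124 can be sketched as follows. This is a minimal sketch under stated assumptions: the distance is taken to be one minus the set overlap ratio (one plausible way to derive a distance from a set similarity, not stated in the text), candidates are assumed to arrive as an iterable of (image id, descriptor set) pairs already ordered nearest first, and the function names and all threshold values are hypothetical.

```python
def set_distance(a, b):
    """Assumed distance: 1 - |A & B| / |A | B| (0.0 for two empty sets)."""
    union = a | b
    return 0.0 if not union else 1.0 - len(a & b) / len(union)


def find_match(target, candidates, match_threshold=0.3,
               max_distance=0.9, max_tries=10):
    """Try candidates in order; return (image_id, distance) or None."""
    for tries, (image_id, descriptor) in enumerate(candidates, start=1):
        if tries > max_tries:       # termination: too many candidates compared
            return None
        d = set_distance(target, descriptor)
        if d <= match_threshold:    # steps 116 and 118: match found, return it
            return image_id, d
        if d > max_distance:        # termination: candidate beyond max distance
            return None
        # otherwise step 124: move on to the next near candidate
    return None                     # candidates exhausted without a match


target = {"h", "e", "l", "o", "t", "r"}
candidates = [
    ("img_a", {"h", "e", "x", "y"}),       # distance 0.75, not a match
    ("img_b", {"h", "e", "l", "o", "t"}),  # distance 1 - 5/6, a match
]
result = find_match(target, candidates)
```

Both termination criteria from the text appear here: a cap on the number of candidates compared and a maximum distance beyond which the search is abandoned.
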

The set similarity metric 40 defines similarity between descriptors which define
data images in terms of inclusion or exclusion of attributes. An associated distance
metric 52 quantifies the distance, or degree of similarity, between such descriptors. The
LSH 38 population of the database 16 employs such a set similarity metric. In a
particular embodiment, images are subdivided into overlapping regions at various scales
and positions, described further below. For each region, certain statistics are computed
which are robust to image transformations, such as an FMT or a color histogram of the
region. Each region of the image, therefore, is represented as a transformation-invariant
descriptor of the image data.

The set similarity metric is applied to order the database 16 and to determine the
difference D between two images. One such metric is a set intersection similarity
metric, as follows. Given two descriptors A and B, the set similarity measure between
A and B is the ratio of the number of elements common to the two sets and the total
number of unique elements in the two sets:

$$D(A,B) = \frac{|A \cap B|}{|A \cup B|} \qquad \text{(Eq. 1)}$$

Following is an example of the set intersection similarity metric applied to determine
the distance between sets. Given two sets of image data, A and B, invariant descriptors
are computed and compared to determine the distance.

The image data is as follows:

Data A = "hello there"
Data B = "hi there"

The invariant descriptor is defined to be the presence of a particular character. Applying
this to the example image data results in the following invariant descriptors:

A = {h, e, l, o, t, r}
B = {h, i, t, e, r}

Applying our similarity metric to determine the distance yields four elements common
to both and seven total unique elements:

$$D(A,B) = \frac{|A \cap B|}{|A \cup B|} = \frac{|\{h,e,t,r\}|}{|\{h,e,l,o,t,r,i\}|} = \frac{4}{7} \approx 0.57$$

It follows logically that the set similarity metric defines a value near 1 as a near match,
and a value near 0 as a distant match.
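
The worked example can be reproduced with a few lines of Python. The function names are illustrative, and two details are assumptions that the text does not state: whitespace is ignored when forming the character sets, and two empty sets are treated as identical.

```python
def char_descriptor(text):
    """Descriptor of the example: the set of distinct non-space characters."""
    return {c for c in text if not c.isspace()}


def set_similarity(a, b):
    """Eq. 1: |A & B| / |A | B|; two empty sets are assumed to score 1.0."""
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)


A = char_descriptor("hello there")  # {h, e, l, o, t, r}
B = char_descriptor("hi there")     # {h, i, t, e, r}
common = A & B                      # {h, e, t, r}, four elements
similarity = set_similarity(A, B)   # 4 / 7
```

Because the descriptors are sets, the repeated characters in each string contribute only once, which is why A has six elements and B has five.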

The set similarity metric and the resulting distance metric comparison are applied
to visual image data by defining a set of statistics which define an image in terms of
boolean relationships. The above example employs the presence or absence of a letter
as a boolean attribute of the sets. Other attributes may be employed. Further, the image
partitioning employed breaks an image up into regions, each of which exhibits set
attributes, illustrated further below.

The statistics are gathered from the data using image processing techniques such
as an FMT, color histogram, or other method operable to define an image or region of
an image in terms of an invariant descriptor. Both the FMT and color histograms have
the valuable property of resilience to geometric transformations. The color histogram is
a typical representation employed in image processing which may be adapted to a set
similarity metric as defined herein. A typical color histogram may have 256 bins, which
will usually be too fine a granularity with which to define equality and inequality of
vectors in a boolean manner applicable to sets. However, vector quantization can be
employed to cluster vectors and consider them to be equal if they are in the same cluster.

Fig. 5 shows an example of vector quantization. Vector quantization allows
representation of numeric information, such as that contained within an invariant
descriptor, in a symbolic way. Both FMTs and color histograms produce numeric
output which is transformed to a symbolic representation, such as by vector
quantization, for use with LSH. Referring to Fig. 5, a two dimensional space is shown.
In the actual implementation, more dimensions would be employed. A typical color
histogram may employ 64 dimensions, for example; however, the two dimensions shown
are intended as illustrative. An x axis 200 and a y axis 202 define a multidimensional
space 204. A collection of four vectors is illustrated, shown by clusters of points
defined by circles 212a - 212d. In the example given above, the alphanumeric
characters embodied in the invariant descriptors could be granularized to form groups of
related vectors. Each cluster of points, for example the cluster 212a, defines a vector
near the vector defining an ideal A, shown by point 214a. Each of the other clusters
212b - 212d is likewise defined around an ideal vector 214b-214d, respectively. Vector
216, being near to the ideal A vector 214a, would be considered part of the cluster 212a.
Inclusion or exclusion of vectors in certain groups may be tuned to give relative weights
to attributes; for example, the lines 212a-212d defining quantized groups need not
necessarily define circular boundaries.
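
One simple way to realize the quantization of Fig. 5 is nearest-centroid assignment, sketched below in Python. The codebook coordinates, the symbol names and the `quantize` function are hypothetical; the text does not specify how the ideal vectors 214a-214d are chosen, so they are simply given here.

```python
import math


def quantize(vector, codebook):
    """Return the symbol whose ideal vector (centroid) is nearest to vector."""
    return min(codebook, key=lambda symbol: math.dist(vector, codebook[symbol]))


# Hypothetical ideal vectors, standing in for points 214a-214d of Fig. 5.
codebook = {
    "A": (1.0, 1.0),
    "B": (5.0, 1.0),
    "C": (1.0, 5.0),
    "D": (5.0, 5.0),
}

vector_216 = (1.4, 0.8)                  # a numeric descriptor near the ideal A
symbol = quantize(vector_216, codebook)  # quantized to the symbol "A"
```

Nearest-centroid assignment carves the space into polygonal cells rather than circles, which is consistent with the remark that the group boundaries need not be circular. Two numeric vectors that quantize to the same symbol are then treated as equal for the purposes of the set similarity metric.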



As indicated above, image partitioning is employed to subdivide an image into
regions of salient features. Fig. 6 shows an example of image partitioning as employed
to define the invariant descriptors. Since the invariant descriptors employed by the set
similarity metric exhibit boolean characteristics, the image partitioning denotes regions
having the presence or absence of a particular attribute. Referring to Fig. 6, an image
220 is subdivided into a 3 by 3 grid of regions, denoted by x axis 222 and y axis 224.
Each of the nine regions (x,y) has the indicated attribute A, B, C or D, and thus the
absence of the remaining attributes A-D. Therefore, invariant descriptors defining each
of the regions are as follows in Table I, wherein absence of an attribute is designated
with an overhead bar notation:

(x,y): {Descriptor}
(1,1): {A, B, C, D}
(1,2): {A, B, C, D}
(1,3): {A, B, C, D}
(2,1): {A, B, C, D}
(2,2): {A, B, C, D}
(2,3): {A, B, C, D}
(3,1): {A, B, C, D}
(3,2): {A, B, C, D}
(3,3): {A, B, C, D}

TABLE I

Alternatively, a partitioning scheme may focus on certain salient features in the image,
since certain regions may contain more useful information than others. Interesting local
features are considered, such as corners and highly-textured patches, and scaled
appropriately to distinguish the content, while static regions of little variance may be
considered more broadly.
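
The grid partitioning of Fig. 6 can be sketched in Python as follows. This is an illustration under stated assumptions: the image is a plain list of rows of 8-bit grayscale values, and `region_attribute` is a hypothetical stand-in that buckets the mean intensity of a region into one of four symbols. In the described system that role would be played by an FMT or color histogram followed by vector quantization.

```python
def partition(image, rows=3, cols=3):
    """Yield ((x, y), region) for a rows-by-cols grid, with x and y from 1."""
    height, width = len(image), len(image[0])
    for y in range(rows):
        for x in range(cols):
            r0, r1 = y * height // rows, (y + 1) * height // rows
            c0, c1 = x * width // cols, (x + 1) * width // cols
            region = [row[c0:c1] for row in image[r0:r1]]
            yield (x + 1, y + 1), region


def region_attribute(region):
    """Hypothetical attribute: mean intensity bucketed into A, B, C or D."""
    pixels = [p for row in region for p in row]
    mean = sum(pixels) / len(pixels)
    return "ABCD"[min(int(mean) // 64, 3)]


def image_descriptor(image):
    """Set descriptor: one (position, attribute) element per grid region."""
    return {(pos, region_attribute(region)) for pos, region in partition(image)}


# A 6x6 dark image with a bright 2x2 patch in the top-left region.
image = [[0] * 6 for _ in range(6)]
for r in range(2):
    for c in range(2):
        image[r][c] = 200

descriptor = image_descriptor(image)
```

Each region contributes exactly one present attribute, so the other three are implicitly absent. Two images can then be compared by applying the set similarity of Eq. 1 to their descriptors.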

Representing image descriptors with vectors formed of the foregoing set
elements is key to the preferred embodiment. Such set descriptors are employed at 34,
42 in Fig. 2a in populating the database 16 and at 46, 48 in Fig. 2b in finding matches
from database 16. The set similarity between descriptors 46, 48 is measured
(calculated) by Eq. 1 as the similarity distances at 52 in Fig. 2b. Such use of set theory
in combination with vector quantization and LSH techniques in determining near
similar images in a database allows the present invention to be efficient and
advantageous over the prior art.




Those skilled in the art should readily appreciate that the programs for storing
and retrieving image data as defined herein are deliverable to a computer in many forms,
including but not limited to a) information permanently stored on non-writeable storage
media such as ROM devices, b) information alterably stored on writeable storage media
such as floppy disks, magnetic tapes, CDs, RAM devices, and other magnetic and
optical media, or c) information conveyed to a computer through communication media,
for example using baseband signaling or broadband signaling techniques, as in an
electronic network such as the Internet or telephone modem lines. The operations and
methods may be implemented in a software executable by a processor or as a set of
instructions embedded in a carrier wave. Alternatively, the operations and methods may
be embodied in whole or in part using hardware components, such as Application
Specific Integrated Circuits (ASICs), state machines, controllers or other hardware
components or devices, or a combination of hardware, software, and firmware
components.

While this invention has been particularly shown and described with references
to preferred embodiments thereof, it will be understood by those skilled in the art that
various changes in form and details may be made therein without departing from the
scope of the invention encompassed by the appended claims. Accordingly, the
invention is not intended to be limited except as defined by the following claims.

What is claimed is:

1. A method of storing and ordering image data in a database comprising:
gathering a plurality of images for inclusion in the database;
computing a match descriptor indicative of each of the plurality of
images, each of the match descriptors corresponding to a multidimensional
space; and
organizing the match descriptors in the database, the organizing being
performed according to a predetermined metric indicative of a correspondence
between a given match descriptor and the other match descriptors in the
database.


2. The method of claim 1 wherein a match descriptor is a vector quantity.

3. The method of claim 2 wherein the correspondence is a similarity of the match
descriptors.

4. The method of claim 1 wherein the predetermined metric is a distance metric.

5. The method of claim 4 wherein the distance metric is derived from a similarity
metric, the similarity metric operable to determine match descriptors near to
other match descriptors based on a distance in the multidimensional space.

6. The method of claim 1 wherein computing the match descriptor includes
computing a Fourier-Mellin Transform (FMT).

7. The method of claim 6 further comprising vector quantization of the FMT.

8. The method of claim 1 wherein the match descriptors are invariant descriptors.

9. The method of claim 8 wherein the invariant descriptors are insensitive to

10. The method of claim 1 wherein the organizing according to a predetermined
metric further comprises Locality-Sensitive Hashing (LSH).

11. A method for finding images in a database comprising:
providing a database of stored images, each of the stored images
represented by a match descriptor, the match descriptors corresponding to a
multidimensional space and organized according to a similarity metric in the
multidimensional space;
providing a target image for which a match is sought in the database;
computing a target descriptor corresponding to the target image;
selecting a candidate match descriptor from the organized match
descriptors in the database;
computing, according to the similarity metric, a distance between the
candidate match descriptor and the target descriptor; and
returning, as the matching image, the candidate match descriptor if the
computed distance from the candidate match descriptor to the target descriptor is
within a predetermined distance threshold.

12. The method of claim 11 further comprising returning if a null search criteria is
satisfied.

13. The method of claim 11 further comprising selecting a successive candidate
match descriptor, each of the successive candidate match descriptors computed
from match descriptors near the former candidate match descriptors.

14. The method of claim 11 wherein the selecting of the database is a nearest-
neighbor mapping.

15. The method of claim 11 wherein the predetermined distance threshold is
satisfied if the similarity metric indicates that the candidate match descriptor is
sufficiently near the target descriptor in the multidimensional space.

16. The method of claim 12 wherein the null search criteria further comprises a
predetermined maximum distance.

17. The method of claim 11 wherein the stored images further comprise visual
image data.

18. A method for storing and retrieving image data comprising:
providing a plurality of match images;
computing a match descriptor corresponding to a multidimensional space
indicative of each of the match images;
organizing each of the match descriptors in a database according to a
predetermined similarity metric, the similarity metric operable to indicate match
descriptors that are near to other match descriptors in the multidimensional
space;
receiving a target image for which a match is sought;
computing a target descriptor indicative of the target image;
mapping into the database to determine a close match of the target
descriptor among the organized match descriptors, a close match determined by
a distance to a near match descriptor within a predetermined threshold, the
mapping further comprising:
selecting a candidate match descriptor from among the organized
match descriptors; and
returning the candidate match descriptor if the candidate match
descriptor is a match to the target descriptor, the match being determined
by a similarity metric.

19. The method of claim 18 further comprising selecting another candidate match
descriptor if the candidate match descriptor is not a match to the target
descriptor, the selecting occurring from among match descriptors organized near
the candidate match descriptors.
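The retrieval steps of claims 18 and 19 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: descriptors are plain tuples of floats, the similarity metric is taken to be Euclidean distance, and "organized near" is approximated by a sorted list whose neighbors are probed outward from the first candidate. The names `organize`, `find_match`, `threshold`, and `max_probes` are hypothetical.

```python
import bisect
import math


def organize(descriptors):
    """Organize match descriptors so that neighbors in the list tend to be similar.

    Assumption: plain lexicographic sorting stands in for the similarity-based
    organization recited in the claim.
    """
    return sorted(descriptors)


def find_match(organized, target, threshold, max_probes=8):
    """Return a stored descriptor within `threshold` of `target`, or None."""
    if not organized:
        return None
    # First candidate: the position where the target would be inserted.
    start = min(bisect.bisect_left(organized, target), len(organized) - 1)
    for step in range(max_probes):
        # Probe offsets 0, +1, -1, +2, -2, ... around the first candidate,
        # i.e. select another candidate organized near the previous one.
        offset = (step + 1) // 2 * (1 if step % 2 else -1)
        index = start + offset
        if 0 <= index < len(organized):
            candidate = organized[index]
            if math.dist(candidate, target) <= threshold:
                return candidate
    return None
```

With `db = organize([(0.0, 0.0), (5.0, 5.0), (9.0, 1.0)])`, a call such as `find_match(db, (5.2, 4.9), 0.5)` first tries the descriptor at the insertion point and then falls back to its neighbors, which is the behavior claim 19 adds to claim 18.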



20. The method of claim 18 wherein near match descriptors are similar vectors in
the multidimensional space.

21. The method of claim 18 wherein the similarity metric is a set similarity metric.

22. A system for finding stored images comprising:
    a database of stored images, each of the stored images represented by a
match descriptor, the match descriptors corresponding to a multidimensional
space and organized according to a similarity metric in the multidimensional
space;
    a descriptor constructor operable to compute a target descriptor
corresponding to a target image for which a match is sought in the database;
    a database mapper operable to select a candidate match descriptor from
mapping near match descriptors in the database; and
    a similarity processor operable to compute, according to the similarity
metric, a distance between the candidate match descriptor and the target
descriptor, and further operable to return, as a matching descriptor, the candidate
match descriptor if the computed distance to the target descriptor is within a
predetermined distance threshold.




23. The method of claim 22 wherein the similarity processor is further operable to
terminate the search if a null search criteria is satisfied.

24. The system of claim 22 wherein the match descriptors and the target descriptor 
are vector quantities. 

25. The system of claim 22 wherein the similarity processor is further operable to
compute the distance based on the similarity in the magnitude of the match
descriptors and the target descriptor.

26. The system of claim 22 wherein the descriptor constructor computes the target
descriptors by computing a Fourier-Mellin Transform (FMT).

27. The method of claim 26 wherein the descriptor constructor is further operable to
determine vector quantization of the FMT.
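Vector quantization, as recited in claim 27, replaces a continuous vector with the index of its nearest entry in a codebook. The sketch below is only an illustration: the claim does not say how the codebook is built or which distance is used, so the codebook values, the Euclidean distance, and the names `quantize` and `quantize_all` are assumptions.

```python
import math


def quantize(vector, codebook):
    """Return the index of the codebook entry nearest to `vector`."""
    return min(range(len(codebook)),
               key=lambda i: math.dist(vector, codebook[i]))


def quantize_all(vectors, codebook):
    """Quantize several coefficient vectors into a set of codeword indices.

    Assumption: representing a descriptor as a set of indices is one way to
    feed a set-based similarity metric; the source does not spell this out.
    """
    return {quantize(v, codebook) for v in vectors}
```

For a hypothetical codebook `[(0.0, 0.0), (1.0, 1.0), (4.0, 0.0)]`, the vector `(0.9, 1.2)` quantizes to index 1 because `(1.0, 1.0)` is its nearest entry.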

28. The system of claim 22 wherein the similarity metric further comprises
Locality-Sensitive Hashing (LSH).
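The idea behind Locality-Sensitive Hashing is that similar descriptors should land in the same hash bucket with high probability, so a query only inspects a few buckets instead of the whole database. The sketch below uses random-hyperplane hashing, one well-known LSH family chosen here for brevity; the claim does not say which family is intended, and the class name `LSHIndex` and its parameters (`num_tables`, `bits_per_key`, `seed`) are hypothetical.

```python
import random
from collections import defaultdict


class LSHIndex:
    """Several hash tables, each keyed by the sign pattern of random projections."""

    def __init__(self, dim, num_tables=4, bits_per_key=6, seed=0):
        rng = random.Random(seed)
        # One independent set of random hyperplanes per table.
        self.planes = [
            [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(bits_per_key)]
            for _ in range(num_tables)
        ]
        self.tables = [defaultdict(list) for _ in range(num_tables)]

    @staticmethod
    def _key(planes, vector):
        # Each bit records which side of a hyperplane the vector falls on.
        return tuple(sum(p * v for p, v in zip(plane, vector)) >= 0
                     for plane in planes)

    def add(self, vector):
        for planes, table in zip(self.planes, self.tables):
            table[self._key(planes, vector)].append(vector)

    def candidates(self, vector):
        """Return stored vectors sharing a bucket with `vector` in any table."""
        found = []
        for planes, table in zip(self.planes, self.tables):
            for stored in table.get(self._key(planes, vector), []):
                if stored not in found:
                    found.append(stored)
        return found
```

The candidates returned by the index would still be checked against the distance threshold; the hash only narrows down which descriptors are worth comparing.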

29. The system of claim 22 wherein the match descriptors are invariant descriptors. 

30. The system of claim 29 wherein the invariant descriptors are insensitive to
geometric translations.
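Translation insensitivity of a Fourier-based descriptor follows from a standard property: shifting a signal changes only the phase of its Fourier coefficients, not their magnitudes. The sketch below demonstrates this on a one-dimensional signal with a circular shift, as a simplified stand-in for a two-dimensional image; it is not a Fourier-Mellin Transform, and the function names are hypothetical.

```python
import cmath


def dft_magnitudes(signal):
    """Magnitudes of the discrete Fourier transform of a real sequence."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(signal)))
        for k in range(n)
    ]


def circular_shift(signal, shift):
    """Translate the sequence by `shift` positions, wrapping around."""
    shift %= len(signal)
    return signal[-shift:] + signal[:-shift]


def same_descriptor(a, b, tolerance=1e-9):
    """True if two signals have matching Fourier magnitudes."""
    return all(abs(x - y) <= tolerance
               for x, y in zip(dft_magnitudes(a), dft_magnitudes(b)))
```

A signal and any circularly shifted copy of it yield the same magnitudes, so a descriptor built from them does not change when the content is translated.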

31. The system of claim 22 wherein the predetermined distance threshold is satisfied
if the similarity metric indicates that the candidate match descriptor is
sufficiently near the target descriptor in the multidimensional space.

32. The system of claim 23 wherein the null search criteria further comprises a
predetermined maximum distance.

33. A computer program product having computer program code for finding images
in a database comprising:
    computer program code for accessing a database of stored images, each
of the stored images represented by a match descriptor, the match descriptors
corresponding to a multidimensional space and organized according to a
similarity metric in the multidimensional space;
    computer program code for providing a target image for which a match is
sought in the database;
    computer program code for computing a target descriptor corresponding
to the target image;
    computer program code for selecting a candidate match descriptor from
the match descriptors in the database;
    computer program code for computing, according to the similarity
metric, a distance between the candidate match descriptor and the target
descriptor; and
    computer program code for returning, as the matching image, the
candidate match descriptor if the distance from the candidate match descriptor to
the target descriptor is within a predetermined distance threshold.

34. A computer data signal for finding images in a database comprising:
    program code for accessing a database of stored images, each of the
stored images represented by a match descriptor, the match descriptors
corresponding to a multidimensional space and organized according to a
similarity metric in the multidimensional space;
    program code for providing a target image for which a match is sought in
the database;
    program code for computing a target descriptor corresponding to the
target image;
    program code for selecting a candidate match descriptor from the match
descriptors in the database;
    program code for computing, according to the similarity metric, a
distance between the candidate match descriptor and the target descriptor; and
    program code for returning, as the matching image, the candidate match
descriptor if the distance from the candidate match descriptor to the target
descriptor is within a predetermined distance threshold.

35. A system for finding stored images comprising:
    means for accessing a database of stored images, each of the stored
images represented by a match descriptor, the match descriptors corresponding
to a multidimensional space and organized according to a similarity metric in the
multidimensional space;
    means for providing a target image for which a match is sought in the
database;
    means for computing a target descriptor corresponding to the target
image;
    means for selecting a candidate match descriptor from the match
descriptors in the database;
    means for computing, according to the similarity metric, a distance
between the candidate match descriptor and the target descriptor; and
    means for returning, as the matching image, the candidate match
descriptor if the distance from the candidate match descriptor to the target
descriptor is within a predetermined distance threshold.
SYSTEM AND METHOD FOR EFFICIENTLY FINDING NEAR-SIMILAR
IMAGES IN MASSIVE DATABASES

ABSTRACT OF THE DISCLOSURE

Massive amounts of multimedia data are stored in databases supporting web
pages and servers, including text, graphics, video and audio. Searching and finding
matching multimedia images can be time and computationally intensive. A method for
storing and retrieving image data includes computing a descriptor, such as a
Fourier-Mellin Transform (FMT), corresponding to a multidimensional space indicative
of each of the stored images and organizing each of the descriptors according to a set
similarity metric. The set similarity metric is based on Locality-Sensitive Hashing
(LSH), and orders descriptors near to other descriptors in the database. The set
similarity metric employs set theory which allows distance between descriptors to be
computed consistent with LSH. A target image for which a match is sought is then
received, and a descriptor indicative of the target image is computed. The database is
referenced, or mapped, to determine close matches in the database. Mapping includes
selecting a candidate match descriptor from among the descriptors in the database and
employing a distance metric derived from the similarity metric to determine if the
candidate match descriptor is a match to the target descriptor.
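The abstract speaks of a distance metric derived from a set similarity metric without naming either one. As a hedged illustration only, the sketch below assumes the set similarity is the Jaccard coefficient (size of the intersection over size of the union) and derives the distance as one minus that value; the function names and the example threshold are hypothetical.

```python
def set_similarity(a, b):
    """Jaccard coefficient of two collections, treated as sets (assumed metric)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty descriptors are treated as identical
    return len(a & b) / len(a | b)


def set_distance(a, b):
    """Distance derived from the similarity: 0.0 for identical sets, 1.0 for disjoint."""
    return 1.0 - set_similarity(a, b)


def is_match(candidate, target, threshold):
    """A candidate matches when its distance to the target is within the threshold."""
    return set_distance(candidate, target) <= threshold
```

Under this assumption, descriptors `{1, 2, 3}` and `{2, 3, 4}` share two of four distinct elements, so their similarity and their distance are both 0.5.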
DECLARATION

SOLE/JOINT INVENTOR
ORIGINAL/SUBSTITUTE/CIP



As a below named inventor, I hereby declare that: my residence, post office address, and citizenship are as stated below next to my name. I believe I
am the original, first, and sole inventor (if only one name is listed below) or a joint inventor (if plural inventors are listed below) of the subject matter
which is claimed and for which a patent is sought on the invention entitled: System and Method for Efficiently Finding Near-Similar Images in
Massive Databases

as described in the specification [ ] attached or [X] of patent Application Serial No. 10/005,193
filed December 4, 2001 and amended on ,



I hereby state that I have reviewed and understand the contents of the above identified Specification, including the claims, as amended by any
amendment referred to above; that I do not know and do not believe the same was ever known or used in the United States of America before my or our
invention thereof, or patented or described in any printed publication in any country before my or our invention thereof or more than one year prior to
this application; that the invention has not been patented or made the subject of an inventor's certificate issued before the date of this application in any
country foreign to the United States of America on an application filed by me or my legal representative or assigns more than twelve months prior to
this application; and that I acknowledge the duty to disclose information of which I am aware which is material to the examination of this application in
accordance with Title 37, Code of Federal Regulations § 1.56(a). Such information is material when it is not cumulative to information already of
record or being made of record in the application, and

(1) it establishes, by itself or in combination with other information, a prima facie case of unpatentability of a claim; or

(2) it refutes, or is inconsistent with, a position the applicant has taken or may take in:

(i) opposing an argument of unpatentability relied on by the Office, or

(ii) asserting an argument of patentability.

RECEIVED
CENTRAL FAX CENTER
NOV 21 2006



I hereby claim foreign priority benefits under Title 35, United States Code § 119 of any foreign application(s) for patent or inventor's certificates listed
below and have also identified below any foreign application(s) having a filing date before that of the application(s) on which priority is claimed:



COUNTRY    APPLICATION NUMBER    DATE OF FILING    PRIORITY CLAIMED UNDER 35 U.S.C. 119

                                                   [ ] YES  [ ] NO



I hereby claim the benefit under 35 U.S.C. § 119(e) of any United States provisional application(s) listed below.



(Application Number)    (Filing Date)

(Application Number)    (Filing Date)



I hereby claim the benefit under Title 35 United States Code § 120 of any United States application(s) listed below and, insofar as any subject matter of
any claim of this application is not disclosed in the prior United States Application, I acknowledge the duty to disclose material information as defined
in Title 37, Code of Federal Regulations § 1.56(a) which occurred between the filing date of the prior application and the national or PCT international
filing date of this application:



I hereby declare that all statements made herein of my own knowledge are true and that all statements made on information and belief are believed to be
true; and further that these statements were made with the knowledge that willful false statements and the like so made are punishable by fine or
imprisonment, or both, under Section 1001 of Title 18 of the United States Code and that such willful false statements may jeopardize the validity of
the application or any patent issued thereon.



FULL NAME OF SOLE OR FIRST INVENTOR
Trista P. Chen

INVENTOR'S SIGNATURE

DATE

RESIDENCE
900 S. Negley Avenue, Apt. 17, Pittsburgh, PA 15232

CITIZENSHIP
Taiwan

POST OFFICE ADDRESS
Same as above


FULL NAME OF SECOND JOINT INVENTOR
T.M. Murali

INVENTOR'S SIGNATURE

DATE

RESIDENCE
46 Pemberton Street, Cambridge, MA 02140

CITIZENSHIP
India

POST OFFICE ADDRESS
Same as above
From: Lehmann, Eileen

Sent: Friday, November 10, 2006 4:56 PM

To: 'murali@cs.vt.edu'

Subject: IMPORTANT: Name Verification Declaration for Patent Application
Attachments: Ex_D.pdf; 200^02024-1 Name Verification Statement.doc; Ex_A.pdf; Ex_B.pdf

Dear Dr. Murali:

As per our telephone conversation today, we need you to verify your full first name, middle initial and last name for the
U.S. Patent and Trademark Office for U.S. patent application no. 10/005,193. Without your verification, the USPTO
will not allow the patent application to issue as a patent. Please review the attached Declaration and Exhibits. If you
need to make any changes to the declaration, please advise. If all is in order, please print out the declaration, sign and
date it and return it to me as a PDF in an e-mail or fax (650) 852-8063.

Please return by November 15, 2006.

Background: The declaration (Exhibit B) for this application was executed as T.M. Murali. After HP had paid the
issue fee, the Examiner contacted me and pointed out that the declaration did not include at least one full first or
middle name as required.

If you have any questions or comments, please do not hesitate to call me.

Sincerely, 
Eileen Lehmann 
Counsel, IP Section 
Hewlett-Packard Company 
Mail Stop 1197 
1501 Page Mill Road 
Palo Alto, CA 94304 
Telephone: (650) 857-7940
Facsimile: (650) 852-8063
eileen.lehmann@hp.com

Exhibit C 



Notice of Action 




July 21, 1999



1 of 1



COMPAQ COMPUTER
C/O S'JSAW LAMl>J£:t>EC:C:ii2L PK03-5/31K 
129 PARKER STREET



ck!^\H 1120 

PETITION FOR A NONIMMIGRANT WORKER
PETITIONER

COMPAQ COMPUTER



MUPXt*!, THIRUVAJATMARUTlfER M. 



Notice Type: Approval Notice
Class: H1B1

Valid from 10/01/1999 to 09/30/2002






Please see the additional information on the back. You will be notified separately about any other cases you filed.
IMMIGRATION & NATURALIZATION SERVICE
VERMONT SERVICE CENTER
75 LOWER WELDEN STREET
SAINT ALBANS VT 05479-0001

Form I797A (Rev. 09/07/93) N





Detach This Half for Personal Records

Receipt #

NAME
CLASS H1B1

VALID FROM 10/01/1999 UNTIL 09/30/2002
PETITIONER: COMPAQ COMPUTER

129 PARKER STREET
MAYNARD MA 01754



Receipt Number

Immigration and
Naturalization Service

I-94
Departure Record



Petitioner: COMPAQ COMPUT



MURALI



TKTRUV^AXMARUTIIER 